ABSTRACT

We present a method called TimeMachine to generate a timeline
of events and relations for entities in a knowledge base. For
example, for an actor, such a timeline should show the most important
professional and personal milestones and relationships such as
works, awards, collaborations, and family relationships. We develop
three orthogonal timeline quality criteria that an ideal timeline
should satisfy: (1) it shows events that are relevant to the entity;
(2) it shows events that are temporally diverse, so they are
distributed along the time axis, avoiding visual crowding and allowing
for easy user interaction, such as zooming in and out; and (3) it
shows events that are content diverse, so they contain many different
types of events (e.g., for an actor, it should show movies and
marriages and awards, not just movies). We present an algorithm
to generate such timelines for a given time period and screen size,
based on submodular optimization and web co-occurrence statistics,
with provable performance guarantees. A series of user studies
using Mechanical Turk shows that all three quality criteria are
crucial to produce quality timelines and that our algorithm
significantly outperforms various baseline and state-of-the-art methods.

Categories and Subject Descriptors: H.2.8 [Database Management]:
Database applications - Data mining
General Terms: Algorithms, Experimentation.

Keywords: Summarization, Timeline, Knowledge Base, Submodular
Optimization.

1. INTRODUCTION 

As the web and other technological advancements continue to
bring down barriers for creation and distribution of information,
relevant information is often buried in an avalanche of data, and
locating it has become increasingly difficult. Search engines
have attempted to address this challenge, but the volume and
diversity of results can still be overwhelming, even for simple
entity queries. In many such cases, for instance when searching
for a person or organization, an overview of the most important
events in an organized and readable format would serve users better,
ideally with interactive features to enable further exploration.
A timeline with clickable key events arranged along a horizontal
time axis would serve this need [39].

Automatically generating timelines is very challenging. To be
specific, consider creating a timeline for the American actor Robert
Downey Jr. There are hundreds of possible candidate events and it
is infeasible to display all of them. Robert Downey Jr. is best
known for his starring roles in the movies Iron Man and Avengers,
but even for a single movie there are dozens of related events to
display (production, release dates, opening, and award ceremonies).
In fact, one should not only focus on movies but provide a more
holistic overview of his life and career. This could include showing
various family relationships (e.g., father Robert Downey Sr.,
ex-wife Deborah Falconer, or wife Susan Downey), important acting
roles for his career (the movie Chaplin and TV show Ally McBeal),
and other notable works and professional relationships. However,
note that events might be related as well: if one includes a movie
award, one might not want to display its release date separately
but rather show a more diverse event instead. Lastly, the timeline
should be interactive to enable further exploration.

Knowledge bases (KBs) of timestamped facts such as Freebase
or YAGO [35] have been used as the source of event information
(in this paper we use Freebase). Previous work has introduced
timeline generation from KBs through visualizing entity-level
co-occurrence in news corpora [26], displaying events associated with
an entity in YAGO [40], and generating context-aware timelines
from Wikipedia [39]. However, these works did not address the
problem of selecting a subset of events but instead displayed all
events [26, 40], or have used a static global ranking that does not
capture dependencies between events and is therefore unable to
encourage diversity [39]. Furthermore, this existing work has not
considered challenges raised by enabling user interaction nor provided
an empirical evaluation of the quality of the generated timelines.

Present work. In this paper, we develop an approach called
TimeMachine to generate a timeline for a given entity of interest. We
develop three orthogonal timeline quality criteria:

1. Relevance: Display only the most “interesting” or “relevant”
events in an entity’s history.

2. Temporal Diversity: Distribute events evenly along the temporal
axis, to avoid visual crowding, and to allow easy interaction
with the depicted events.

3. Content Diversity: Display a diverse set of event types (e.g.,
for an actor, do not only list the movies they have been in).

Consequently, we propose a principled solution to timeline generation
according to these criteria based on submodular optimization,
for which we both provide theoretical performance guarantees and
show empirical evidence of significant improvement over baseline
and state-of-the-art methods.

In Figure 1 we show that our approach successfully generates
a timeline of relevant events that is diverse both in terms of content
and time. This timeline is also interactive in three ways. First,
when the user hovers over an image, we show various details, such
as “Robert Downey Jr. starred in The Avengers, released on April
11, 2012”. Second, the user can specify a particular time period,
and a new timeline for that period will replace the current one.
For example, Figure 4 shows the timeline for Robert Downey Jr.
from 2007 until 2014 and gives an example event description. The
zoomed-in version focuses attention on more recent events, such as
his award for Tropic Thunder and his role in the Sherlock Holmes
movies. Finally, the user can click on an entity icon, such as Susan
Downey, and a timeline for this entity will be displayed.

Figure 1: An example timeline for Robert Downey Jr. (American actor)
generated by our proposed approach. Note that the timeline is
interactive and displays explanations for each event on hover (see
Figure 4). Furthermore, it can be dynamically zoomed.

Our approach involves the following two main steps, which are
sketched in Figure 2. First, given a subject entity of interest, we
generate as many candidate events as possible by searching for
neighboring entities with timestamps in the given knowledge base
(Section 2). We generate candidate events for all possibly interesting
subjects offline. Second, given a set of candidates and a time
period of interest, we select (online) a diverse subset of the most
relevant events subject to temporal diversity or layout constraints
(Section 3). To do this, we maximize a submodular objective using
various relevance signals based on web co-occurrence, subject to
these layout constraints. We prove that our greedy algorithm for
optimization yields close-to-optimal solutions. In addition, our
algorithm allows for fast dynamic updates of the timeline based on
user interaction (zooming in or out).

We evaluate our proposed algorithm through a series of user
studies with 1154 raters and compare it to various simpler baselines
and state-of-the-art approaches. Our experiments
show that users always significantly prefer our proposed method
(60-91% of timeline comparisons). Further, we demonstrate that
enforcing temporal diversity and content diversity significantly
improves the results.

In summary, our main contributions are as follows:

1. A design of a timeline search engine that efficiently supports
various types of user interaction.

2. An algorithm for generating entity timelines based on
submodular optimization and web co-occurrence scores.

3. An extensive user study of the relative importance of different
signals for determining entity relevance and different
notions of diversity.

2. EVENT CANDIDATE GENERATION 

Recall from Figure 2 that there are two main steps: candidate
event generation and event selection. In this section, we describe
how we generate candidate events given a subject of interest. Our
approach is to generate a large set of events, and then to filter out
“irrelevant” ones. We give an evaluation of this filtering step. This
is a necessary preparation step for our key contribution in this paper
(described in the following section): dynamically selecting a subset
of the remaining events at runtime, depending on the time span of
interest and the available screen real estate.

Figure 2: System architecture. TimeMachine traverses the
KB offline to generate candidate events for a subject of interest
(e.g., Robert Downey Jr.). At run time, the user specifies a time
period of interest and TimeMachine selects a subset of events
from the candidates to generate the timeline.

Figure 3: An illustration of the event candidate generation step.
Events are short paths that are associated with a timestamp.


2.1 Event Generation 

We can consider the KB as a graph with nodes representing subjects
and objects, and edges representing the relationship (predicate)
between the nodes. Given a particular subject represented by
a node $N_s$ in the KB, we are interested in nodes that are connected
to $N_s$ through some paths and are associated with a timestamp; we
call such paths “events”. As we discuss below, we consider two
kinds of events: simple and compound. Figure 3 illustrates the
overall process for Robert Downey Jr.

Simple events. Simple events are nodes with timestamps that can
be reached by paths of length one or two starting at $N_s$. In the
example in Figure 3, we can traverse along an edge of type
date-of-birth, and reach a node representing the corresponding date; this is
a path of length one (1-hop event). We can also traverse along an
edge of type starred-in, and reach the node representing the movie
The Avengers. To find the corresponding date, we traverse along a
second edge of type release-date and reach a node with the release
date of the movie. This is a path of length two.

Figure 4: Description for the Sherlock Holmes event (award
nomination) that gets displayed on hover.

We can formally represent a simple event (derived from a path
of length two) as follows: $s \xrightarrow{p_1} re_1 \xrightarrow{p_2} t$,
where $s$ is the subject, $re_1$ is a related entity (such as The
Avengers), $t$ is the timestamp, and $p_1$ and $p_2$ are the predicates
along the path. For simplicity, we represent 1-hop events in a similar
way, by introducing a self-loop through $p_1 = \mathit{self}$ and
$re_1 = s$. For each event $e$, we define the subject of the event as
$\mathrm{SUB}(e) = s$, the related entity as $\mathrm{RE}(e) = re_1$,
the timestamp as $\tau(e) = t$, the entity path as $\pi_{re}(e) = p_1$,
and the time path as $\pi_t(e) = p_1.p_2$.


Compound events. Extracting only simple events will miss out on
some implicit connections to other related entities. For example,
consider the collaboration between Robert Downey Jr. and Samuel
L. Jackson in the movie The Avengers shown in Figure 3. Even
though there is no direct edge between Robert Downey Jr. and
Samuel L. Jackson, they are connected through a path as they starred
in the same movie.

We can discover such connections as follows. Suppose we have
simple events $s_1 \xrightarrow{p_1} re_1 \xrightarrow{p_2} t$ and
$s_2 \xrightarrow{p_3} re_1 \xrightarrow{p_2} t$, which
share the same related entity and timestamp, but differ in their first
hops. We join these events to generate a new compound event

$$e = \left.\begin{array}{l} s_1 \xrightarrow{p_1} \\ s_2 \xrightarrow{p_3} \end{array}\right\} re_1 \xrightarrow{p_2} t.$$

We treat $s_2$ as another related entity $re_2$ for $s_1$,
and vice versa; in other words, $\mathrm{RE}(e) = re_2 = s_2$, and
$\pi_{re}(e) = p_1.p_3$. For a discussion of implementation details
please refer to the appendix of the full version of this paper.
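
The two kinds of events can be illustrated with a minimal sketch over a toy triple store. This is not the authors' implementation: the predicate names, the `DATE_PREDICATES` set used to recognize timestamp edges, and the tuple layouts are assumptions made for illustration only.

```python
from collections import defaultdict

# Toy KB as (subject, predicate, object) triples. The predicate names and the
# tuple layouts below are illustrative assumptions, not Freebase's schema.
TRIPLES = [
    ("Robert Downey Jr.", "date-of-birth", "1965-04-04"),
    ("Robert Downey Jr.", "starred-in", "The Avengers"),
    ("Samuel L. Jackson", "starred-in", "The Avengers"),
    ("The Avengers", "release-date", "2012-04-11"),
]
DATE_PREDICATES = {"date-of-birth", "release-date"}  # assumed timestamp edges


def simple_events(triples, date_predicates):
    """Return simple events as (s, p1, re1, p2, t) tuples."""
    out_edges = defaultdict(list)
    for s, p, o in triples:
        out_edges[s].append((p, o))
    events = []
    for s, edges in list(out_edges.items()):
        for p1, o in edges:
            if p1 in date_predicates:
                # 1-hop event, stored with a self-loop: p1 = "self", re1 = s.
                events.append((s, "self", s, p1, o))
            else:
                # 2-hop event: follow a second edge that ends in a timestamp.
                for p2, t in out_edges.get(o, []):
                    if p2 in date_predicates:
                        events.append((s, p1, o, p2, t))
    return events


def compound_events(events):
    """Join simple events that share (re1, p2, t) but have different subjects.

    Returns (s1, entity_path, re2, via, t) tuples, where re2 = s2 and
    entity_path = "p1.p3".
    """
    groups = defaultdict(list)
    for s, p1, re1, p2, t in events:
        if p1 != "self":
            groups[(re1, p2, t)].append((s, p1))
    joined = []
    for (re1, _p2, t), hops in groups.items():
        for s1, p1 in hops:
            for s2, p3 in hops:
                if s1 != s2:
                    joined.append((s1, f"{p1}.{p3}", s2, re1, t))
    return joined
```

On this toy graph, `simple_events` yields the birth date and the movie release as events for Robert Downey Jr., and `compound_events` links him to Samuel L. Jackson through their shared movie.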


Event descriptions. To ensure an event can be understood by an
end-user when they hover over the corresponding box on the timeline,
we have to convert these paths into natural language form.
We do this by manually defining some templates for the 100 most
frequently occurring paths (see Figure 4 for an example). For the
remaining paths, we simply concatenate the English names of the
corresponding predicates and entities.


2.2 Event Filtering 

The event generation steps we have just described may generate
some irrelevant events. For example, they can discover a path
“nationality → date founded”, so everyone with nationality USA has
a candidate event with timestamp July 4, 1776, the date on which
the USA was founded. However, arguably this event is irrelevant
to most people, since it is not specific to them, and it occurred well
before many of them were born.¹ We propose two simple heuristics
that capture these intuitions and filter out many irrelevant events.

The Frequency Filter uses the concept of inverse document
frequency from the information retrieval (IR) community. The idea is
that an event that is commonly associated with a large number of
subjects is unlikely to be particularly interesting. To formalize this,
consider the set of all events $(s, re, \pi_t, t)$. Let
$N(\pi_t, re, t)$ be the number of subjects that are connected to $re$
and $t$ through path $\pi_t$, and let $N(\pi_t)$ be the number of
distinct $(re, t)$ pairs that are connected to any subject via $\pi_t$.
Furthermore, let
$C(\pi_t) = |\{(re, t) : N(\pi_t, re, t) > \theta_1\}|$
be the number of $(re, t)$ pairs for which there are more than
$\theta_1$ subjects connected through path $\pi_t$. Then for any given
path $\pi_t$, if $C(\pi_t)/N(\pi_t) > \theta_2$, where $\theta_2$ is
some threshold, we drop that path for all subjects. Note that this will
generalize across entities. For example, discovering that
“nationality → date founded” is an irrelevant path based on people born
in some countries allows us to drop instances of this path also for
people born in all other countries.

Further, entities in a KB are naturally associated with a period
of existence: individuals are born and pass away, companies get
founded and go out of business, and musical groups get formed and
split up. The second filter, the Existence Filter, filters out events
that occurred before an entity began to exist. If we find that a
particular kind of path is filtered out for a large fraction (say more
than $\theta_3$) of entities, we filter the path out for all entities.
A canonical example is “parent → date of birth”, which obviously occurs
before the subject entity is born (i.e., the path is filtered out for
100% of entities). Based on our experiments (discussed next), we chose
$\theta_1 = 50$ and $\theta_2 = \theta_3 = 0.5$, and we
observed that slightly varying the parameters had very little impact
on the results.
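
Both filters can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not the authors' code: events are assumed to be `(subject, related_entity, time_path, timestamp)` tuples with numeric timestamps, `start_of_existence` is a hypothetical lookup of each subject's first year of existence, and a subject counts as "filtered" for a path if at least one of its events on that path precedes its existence (one possible reading of the text).

```python
from collections import defaultdict


def frequency_filtered_paths(events, theta1=50, theta2=0.5):
    """Return the time paths dropped by the frequency heuristic."""
    subjects = defaultdict(set)  # (path, re, t) -> subjects connected to it
    for s, re, path, t in events:
        subjects[(path, re, t)].add(s)
    total = defaultdict(int)    # N(path): distinct (re, t) pairs on the path
    crowded = defaultdict(int)  # C(path): pairs with more than theta1 subjects
    for (path, _re, _t), subs in subjects.items():
        total[path] += 1
        if len(subs) > theta1:
            crowded[path] += 1
    return {p for p in total if crowded[p] / total[p] > theta2}


def existence_filtered_paths(events, start_of_existence, theta3=0.5):
    """Return the time paths dropped by the existence heuristic."""
    seen = defaultdict(set)   # path -> subjects that have this path
    early = defaultdict(set)  # path -> subjects with an event before existence
    for s, _re, path, t in events:
        if s not in start_of_existence:
            continue
        seen[path].add(s)
        if t < start_of_existence[s]:
            early[path].add(s)
    return {p for p in seen if len(early[p]) / len(seen[p]) > theta3}
```

Because both functions return whole paths rather than individual events, a path judged irrelevant on some subjects is dropped for every subject, which is the generalization across entities described above.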

2.3 Evaluation of Event Filtering 

We used Freebase to generate candidate events for four types
of entities: music artists, actors, politicians, and athletes. We
generated candidate events for all entities of these types in Freebase
and evaluated the quality of the results.

We evaluated the quality of our filtering using true positive /
negative rate metrics as follows. First, for each filtering heuristic,
we estimate the fraction of filtered paths that were correctly filtered
(i.e., judged irrelevant by a human), which we call the true negative
rate. Second, we estimate the fraction of non-filtered paths that are
correctly not filtered (i.e., judged relevant by a human), which we
call the true positive rate. For each metric, we evaluated the top 100
most frequent path types, covering over 90% of all generated event
instances (out of 5269 different path types generated in total). Two
domain experts manually judged each path as relevant or irrelevant.

¹ Even for George Washington, a founding father of the USA, it is
safe to eliminate the “nationality → date founded” path, as there
are other paths connecting him to the USA and its foundation date.

Figure 5: Log-log coverage plot showing the number of entities
with X or more candidate events and illustrating the impact of
adding compound events (dashed line).

We observe that the Frequency Filter is 84% correct (i.e., only 16%
of the paths it filters out are in fact relevant), and the Existence
Filter is 100% correct (i.e., everything it filters out is irrelevant).
The main failure case for the Frequency Filter consisted of
relevant events involving many entities, such as large award
ceremonies or military conflicts. We also observe that among the
paths that pass both filters, 87% are correct. The main failure cases
are birthdates of related people (e.g., members of the same band),
which are arguably irrelevant to the subject.

In addition to high correctness, we need the event generation
phase to have high coverage. Figure 5 plots the number of events on
the X-axis versus the number of entities for which we extracted at
least this many events on the Y-axis (after filtering). Suppose we
require at least 100 candidate events for an entity before we consider
it to be “history rich” enough for us to generate its timeline. The
figure shows that we can generate timelines for 12k entities if we use
simple events, and for 64k entities if we use compound events (see
Section 2.1). This shows that Freebase has a sufficiently rich set of
events to make our approach possible, even though it is incomplete
in many other ways. With the advent of systems for automated
knowledge-base population such as Knowledge Vault [12], we can
expect the coverage to improve further in the future.

3. EVENT SELECTION 

We showed in the previous section that a given entity may have
hundreds of candidate events associated with it. In this section, we
discuss our main contribution, a way of selecting a small subset of
events to be shown on the timeline, given the time span of interest
and a specified amount of space on the screen.

Our approach will be based on submodular optimization, which
we explain in general terms in Section 3.1 (following [7, 20]). We
define our specific optimization problem in Section 3.2 and give
details in Section 3.3 and Section 3.4. In Section 3.5 we describe
our efficient approximation algorithm, which we prove in
Section 3.6 to yield close-to-optimal solutions. Finally, in
Section 3.7 we discuss how our algorithm enables user interactions,
such as zooming in (or out) on the timeline.

3.1 Submodular Function Maximization 

Suppose we have a set (e.g., events) denoted by $\mathcal{X}$, and an
evaluation function for sets $f : 2^{\mathcal{X}} \to \mathbb{R}$. Let
$f_S(e) = f(S \cup \{e\}) - f(S)$ be the marginal gain of adding
element $e \in \mathcal{X}$ to set $S$.

A function $f : 2^{\mathcal{X}} \to \mathbb{R}$ is submodular if for
every pair of subsets $A \subseteq B \subseteq \mathcal{X}$ and element
$e \in \mathcal{X} \setminus B$ we have $f_A(e) \ge f_B(e)$.
Intuitively this means that the benefit of adding element $e$ to the
smaller set $A$ is at least as large as that of adding it to the bigger
set $B$, so $f$ exhibits the property of diminishing returns. We
restrict our attention to monotone functions; that is,
$f(A) \le f(B)$ for all $A \subseteq B$. We also assume
$f(\emptyset) = 0$; together with monotonicity, this implies that $f$
is non-negative.
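
To make the definitions concrete, the following sketch checks monotonicity and diminishing returns by brute force on a tiny ground set. The weighted coverage function and the event and entity names are illustrative assumptions; the exhaustive check is exponential and only meant for toy inputs.

```python
from itertools import combinations


def powerset(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def coverage(weights, covers):
    """Build f(S) = total weight of the items covered by the elements of S."""
    def f(S):
        covered = set().union(*(covers[e] for e in S))
        return sum(weights[x] for x in covered)
    return f


def is_monotone_submodular(f, ground):
    """Exhaustively test f(A) <= f(B) and f_A(e) >= f_B(e) for all A <= B."""
    ground = frozenset(ground)
    subsets = list(powerset(ground))
    for A in subsets:
        for B in subsets:
            if not A <= B:
                continue
            if f(A) > f(B):
                return False  # not monotone
            for e in ground - B:
                if f(A | {e}) - f(A) < f(B | {e}) - f(B):
                    return False  # marginal gain grew: not submodular
    return True


# Two hypothetical events cover the same entity, a third covers another one.
covers = {"e1": {"Avengers"}, "e2": {"Avengers"}, "e3": {"Susan Downey"}}
weights = {"Avengers": 3, "Susan Downey": 2}
f = coverage(weights, covers)
```

Here `f({"e1", "e2"})` equals `f({"e1"})`, since the second event about the same entity adds nothing; this is the diminishing-returns behavior that the definition formalizes.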


Constraints. We want to compute $\max_{S \subseteq \mathcal{X}} f(S)$
subject to some constraints on $S$. A common constraint is on the size
or cardinality of $S$ [20]. However, in our case, we have more complex
constraints, related to temporal diversity and overlap. To formalize
these constraints, we need the notion of an independence family,
defined as follows. An independence family
$\mathcal{I} \subseteq 2^{\mathcal{X}}$ is a family of subsets
that is downward closed; that is, $A \in \mathcal{I}$ and
$B \subseteq A$ implies that $B \in \mathcal{I}$. A set $A$ is called
independent if $A \in \mathcal{I}$. Popular independence families
include matroids and intersections of matroids.
As an example, given $\mathcal{X} = \{a, b, c\}$, an independence
family is as follows:
$\mathcal{I} = \{\emptyset, \{a\}, \{b\}, \{c\}, \{a, c\}\}$.

p-system. For a set $Y \subseteq \mathcal{X}$, a set $J$ is called a
base of $Y$ if $J$ is a maximal independent subset of $Y$; in other
words $J \in \mathcal{I}$ and for each $e \in Y \setminus J$, we have
$J \cup \{e\} \notin \mathcal{I}$. Note that $Y$ may
have multiple bases, and further, a base of $Y$ may not be a base
of a superset of $Y$. In our example of $\mathcal{X}$ and
$\mathcal{I}$, in the case of $Y = \mathcal{X}$, the bases are $\{b\}$
and $\{a, c\}$ (since $\mathcal{I}$ does not include
$\{a, b\}$, $\{b, c\}$, or $\{a, b, c\}$).

We will now use this concept to define a more general notion
of independence families parametrized by an integer $p$, as follows.
$(\mathcal{X}, \mathcal{I})$ is said to be a p-system if for each
$Y \subseteq \mathcal{X}$, the cardinality of the largest base of $Y$
is at most $p$ times the cardinality of the smallest base of $Y$:

$$\frac{\max_{J:\, J \text{ is a base of } Y} |J|}{\min_{J:\, J \text{ is a base of } Y} |J|} \le p. \qquad (1)$$

To continue with our example of $\mathcal{X}$, $\mathcal{I}$, and
$Y = \mathcal{X}$, there are two bases and
$p = |\{a, c\}| / |\{b\}| = 2$. For all other choices of
$Y \subseteq \mathcal{X}$, we have $p = 1$. Thus,
$(\mathcal{X}, \mathcal{I})$ is a 2-system. The notion of p-systems will
be useful in Section 3.6 to prove certain approximation guarantees.
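
The running example can be verified mechanically. The sketch below enumerates the bases of every subset and computes the smallest valid p for the example family; the helper names `bases` and `p_of` are hypothetical, and the enumeration is exponential, so it is only suitable for tiny ground sets.

```python
from itertools import combinations


def bases(Y, family):
    """Return the maximal members of `family` that are subsets of Y."""
    Y = frozenset(Y)
    inside = [J for J in family if J <= Y]
    # J is a base if no element of Y \ J can be added while staying independent.
    return [J for J in inside if not any((J | {e}) in family for e in Y - J)]


def p_of(ground, family):
    """Smallest p such that (ground, family) satisfies the p-system ratio."""
    p = 1.0
    for r in range(len(ground) + 1):
        for Y in combinations(ground, r):
            sizes = [len(J) for J in bases(Y, family)]
            if min(sizes) > 0:  # skip subsets whose only base is the empty set
                p = max(p, max(sizes) / min(sizes))
    return p


# The independence family from the example above.
FAMILY = {
    frozenset(),
    frozenset("a"),
    frozenset("b"),
    frozenset("c"),
    frozenset("ac"),
}
```

For the full ground set, `bases("abc", FAMILY)` returns the two bases `{b}` and `{a, c}`, and `p_of("abc", FAMILY)` evaluates to 2.0, matching the text.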

Advantages. Before continuing, it is worth discussing why we are
using submodular optimization. A simple alternative would be to
just score each event independently, sort the events by score, and
return the top events (as long as they fit into the timeline). Such
static rankings are insufficient for our purposes since they do not
consider diversity. For instance, even though the movie The Avengers
is very relevant to Robert Downey Jr., the timeline should not solely
consist of The Avengers events (filming, production, release date,
award ceremonies, sequels, etc.). Furthermore, we want diversity
in the temporal spacing of the events (we show in our user studies that
users strongly prefer diverse timelines), which depends on the time
period or zoom factor chosen by the user.

The best way to capture these effects is to reason about the entire
set of events that should make up the timeline. A submodular
set function allows for exactly that, and is able to encourage
different notions of diversity, as we show in Section 3.3. In addition,
there are computational benefits to using monotone submodular set
functions. In particular, as we show in Section 3.6, we can devise a
greedy algorithm that enjoys certain optimality guarantees.

3.2 Timeline Optimization Problem 

We now formalize our problem. Let $E \subseteq \mathcal{E}$ be the set
of all candidate events for a particular subject entity $s$,
constrained to events within the user-specified or default time span.
We will display each event as a small box of width $w$ and height $h$,
as shown in Figure 1. Assume we have available screen space of width
$W$ and height $H$. Let $n = \lfloor H/h \rfloor$ be the number of
boxes that can be stacked vertically within $H$. Our goal is to find
the optimal timeline $T^*$, which we define as follows:

$$T^* = \arg\max_{T \subseteq E} \mathrm{Rel}(s, T) \quad \text{s.t.} \quad \mathrm{Constraints}(T, E, W, w, n) \qquad (2)$$





The relevance function Rel evaluates how relevant each event is to
the subject and how diverse they are; this is described in more detail
in Section 3.3. The temporal constraint Constraints requires
that all events can fit into the provided space without overlapping or
occluding each other; this is described in more detail in Section 3.4.

Note also that our algorithm is able to adapt to different form
factors, for example for mobile or desktop, since height and width
are just parameters to the optimization algorithm.

3.3 Relevance Function 

The function $\mathrm{Rel}(s, T)$ captures the quality of the selected
subset of events $T$ with respect to the timeline subject $s$. This is
defined as a linear combination of two different kinds of relevance
functions:

$$\mathrm{Rel}(s, T) = \lambda\, \mathrm{ERel}(s, T) + (1 - \lambda)\, \mathrm{DRel}(s, T)$$

where $0 \le \lambda \le 1$ trades off the importance of related
entities (ERel) versus the importance of related dates (DRel).² We
define these terms next.


3.3.1 Entity Relevance 

We define the relevance of a set of events $T$ to an entity $s$ as
follows:

$$\mathrm{ERel}(s, T) = w_1^e\, \mathrm{E2E}(s, T) + w_2^e\, \mathrm{E2EPath}(T) + w_3^e\, \mathrm{G2E}(T)$$

where E2E measures how relevant the specific events are, E2EPath
measures how relevant the paths are, and G2E measures how relevant
the events are globally (i.e., independent of $s$). As we show
shortly, we combine E2E with E2EPath and G2E to handle data
sparsity (cf. backoff smoothing). This enables inductive reasoning
that certain relationships hold generally when we only see
a few examples of them. For example, on average, movie roles
are more relevant to actors than TV episode roles (E2EPath). We
discuss how we set the weight parameters later in the paper.

In more detail, we measure the entity-to-entity score as follows:

$$\mathrm{E2E}(s, T) = \sum_{re \in \{\mathrm{RE}(e) \mid e \in T\}} \mathrm{E2ECooc}(s, re)$$

where $\mathrm{E2ECooc}(s, re)$ measures co-occurrence of the entities,
and is defined in Section 3.3.3.

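Since a path from a subject to a specific entity may occur too
rarely to be reliably estimated, we also consider measuring how
good the path is, by averaging the co-occurrence score over all
entities that can be reached via all the paths in the timeline:

$$\mathrm{E2EPath}(T) = \sum_{p \in \{\pi_{re}(e) \mid e \in T\}} \; \operatorname*{mean}_{e \in \mathcal{E},\ \pi_{re}(e) = p} \mathrm{E2ECooc}(\mathrm{SUB}(e), \mathrm{RE}(e))$$

Here the mean for a path $p$ is taken over all events in $\mathcal{E}$
whose entity path is $p$, not only over the events in $T$.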


Finally, since even the E2EPath signal may be too sparse to
reliably estimate, we consider G2E, which estimates the global
importance of each entity in the timeline:

$$\mathrm{G2E}(T) = \sum_{re \in \{\mathrm{RE}(e) \mid e \in T\}} \mathrm{GlobalImportance}(re)$$


We estimate $\mathrm{GlobalImportance}(re)$ as the fraction of search
queries that mention the entity $re$, measured from a 3-month query
log (though other measures of global importance could be used
instead). Inferring the entity mentioned in a query is done using a
proprietary system that applies standard entity linkage algorithms
(such as [32]) to the landing page of the query.


² We set $\lambda = 0.75$ throughout, based on preliminary experiments
on a holdout set showing that users slightly prefer entity relevance
to date relevance.


All three functions, E2E, E2EPath, and G2E, are weighted
coverage functions defined over a set rather than a multiset of
related entities (or paths to related entities). As such, they
naturally favor content diversity as duplicate entities or paths are
only counted once.
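
The set-versus-multiset point can be seen in a small sketch of the three entity relevance terms. Everything concrete here is an assumption for illustration: the `Event` record, the `cooc` and `importance` lookup tables (standing in for the co-occurrence and global importance scores), the weight tuple `w`, and the toy entity names.

```python
from collections import namedtuple

# Hypothetical event record: subject, related entity, entity path, timestamp.
Event = namedtuple("Event", "sub re re_path t")


def e2e(s, T, cooc):
    """Sum co-occurrence scores over the SET of related entities in T."""
    return sum(cooc.get((s, re), 0.0) for re in {e.re for e in T})


def e2e_path(T, all_events, cooc):
    """For each distinct entity path in T, add its mean score over all events."""
    total = 0.0
    for p in {e.re_path for e in T}:
        scores = [cooc.get((e.sub, e.re), 0.0)
                  for e in all_events if e.re_path == p]
        total += sum(scores) / len(scores)
    return total


def g2e(T, importance):
    """Sum global importance over the SET of related entities in T."""
    return sum(importance.get(re, 0.0) for re in {e.re for e in T})


def erel(s, T, all_events, cooc, importance, w=(1.0, 1.0, 1.0)):
    """Weighted combination of the three terms; `w` is an assumed weighting."""
    return (w[0] * e2e(s, T, cooc)
            + w[1] * e2e_path(T, all_events, cooc)
            + w[2] * g2e(T, importance))


# Toy data: two events about the same movie and one about a spouse.
release = Event("s", "movie", "starred-in", 2012)
award = Event("s", "movie", "nominated-for", 2013)
wedding = Event("s", "spouse", "married-to", 2005)
ALL = [release, award, wedding]
COOC = {("s", "movie"): 0.5, ("s", "spouse"): 0.25}
```

With these toy scores, adding `award` to a timeline that already contains `release` leaves `e2e` unchanged at 0.5, while adding `wedding` raises it to 0.75, so the objective rewards the more diverse choice.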


3.3.2 Date Relevance 

We define the relevance of a set of dates as follows:

$$\mathrm{DRel}(s, T) = w_1^d\, \mathrm{E2D}(s, T) + w_2^d\, \mathrm{E2DPath}(T)$$

The functions E2D and E2DPath are defined in a very similar
way to their E2E counterparts. For specific dates we have

$$\mathrm{E2D}(s, T) = \sum_{t \in \{\tau(e) \mid e \in T\}} \mathrm{E2DCooc}(s, t)$$

Recall that $\tau(e)$ is the timestamp of event $e$, so E2DCooc
measures how often an entity and date co-occur, as explained below.
Then for the path level we have

$$\mathrm{E2DPath}(T) = \sum_{p \in \{\pi_t(e) \mid e \in T\}} \; \operatorname*{mean}_{e \in \mathcal{E},\ \pi_t(e) = p} \mathrm{E2DCooc}(\mathrm{SUB}(e), \tau(e))$$

Similarly, we again use a set instead of a multiset for the
timestamps and time paths to favor temporal diversity.


3.3.3 Web-based Co-occurrence Scores 


We use co-occurrence signals between entities on the web to capture
how related two entities are (E2ECooc). Similarly, we use
co-occurrence between entities and dates (E2DCooc) to capture how
related a particular date is to an entity. We compute these quantities
as follows:

1. We run a suite of standard NLP tools (named entity recognition,
coreference resolution, etc.) over a large corpus of
10B web documents using a set of in-house tools, similar
to the Stanford CoreNLP package. We extract entity mentions
(which are resolved to Freebase IDs) and date mentions
(both year and full date if available), using techniques similar
to those described in [32].

2. For each entity mention, we collect all entity-entity and
entity-date co-occurrences within a small window around the mention
(window of 100 characters or 10-12 words on average).

3. We count these co-occurrences, and convert to probabilities
(by normalizing the counts). We define the co-occurrence
scores using normalized pointwise mutual information
(NPMI) as follows:

$$\mathrm{E2ECooc}(s, re) = \mathrm{NPMI}(s; re) = \frac{\mathrm{PMI}(s; re)}{-\log p(s, re)}$$

$$\mathrm{PMI}(s; re) = \log \frac{p(s, re)}{p(s)\, p(re)}$$

E2DCooc is defined exactly like E2ECooc with the only difference
that a timestamp $t$ is substituted for entity $re$.

PMI measures the difference between the co-occurrence probability
and the probability expected by chance if the events were
independent. It is critical to account for particularly popular
entities (e.g., Barack Obama) or dates (e.g., 2014), and dividing the
co-occurrence probability by the popularity of the co-occurring
entities/dates is a principled way of achieving this.

NPMI normalizes PMI to the range $[-1, 1]$. Since we are only
interested in the most related pairs of entities (or entity/date pairs),
we only retain positive NPMI scores. Furthermore, we require that
this co-occurrence was extracted from at least five different web
domains for robustness. Computing co-occurrence statistics from a
large web corpus (10B documents) and generating candidate events
(for over 1M entities) takes about six hours using map-reduce.
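
The NPMI score itself is simple to compute once counts are available. The sketch below assumes that probabilities are obtained by dividing pair and single counts by a common total (for example, the number of context windows); that normalization, the function names, and the toy counts are assumptions, and the requirement of at least five web domains is omitted.

```python
import math


def npmi(p_xy, p_x, p_y):
    """Normalized pointwise mutual information, in [-1, 1]."""
    if p_xy <= 0.0:
        return -1.0  # the pair never co-occurs
    if p_xy >= 1.0:
        return 1.0   # degenerate case: avoids dividing by -log(1) = 0
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / -math.log(p_xy)


def cooc_scores(pair_counts, single_counts, total):
    """Turn raw counts into NPMI scores, retaining only positive ones."""
    scores = {}
    for (x, y), c in pair_counts.items():
        score = npmi(c / total, single_counts[x] / total,
                     single_counts[y] / total)
        if score > 0:
            scores[(x, y)] = score
    return scores
```

A pair that co-occurs exactly as often as chance predicts gets a score of 0 and is dropped, whereas a pair that only ever appears together gets a score of 1.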

3.3.4 Submodularity of Objective Function 

We now show that our objective function is a monotone submodular
set function.

Theorem 3.1. Let $f(T) = \mathrm{Rel}(s, T)$ for any given subject $s$.
Then $f : 2^E \to \mathbb{R}^+$ is a monotone submodular set function.

Proof. First we note that $f$ is a non-negative linear combination
of weighted coverage functions (since
$w_1^e, w_2^e, w_3^e, w_1^d, w_2^d \ge 0$).
Each of these is known to be submodular [14, 20], which is easy
to see, due to the diminishing returns property of weighted coverage
functions. Furthermore, a non-negative linear combination of
submodular functions is submodular as well.

Second, we note that $f$ is also monotone. This holds since all
individual weighted coverage functions are non-negative (because
E2ECooc, E2DCooc, and GlobalImportance are all non-negative),
so adding more terms cannot decrease the sum.

Third, we have $f(\emptyset) = 0$, again because all weighted coverage
functions lead to empty sums. □

3.4 Temporal Diversity Constraint 

The layout constraint requires that all events can fit into a timeline
of width $W$ and height $H$ without overlap. We consider a simple
layout strategy: if the boxes (of width $w$) depicting two events
have a temporal overlap, we can stack one on top of the other, as
shown in Figure 1, but we require that the height of this stack be at
most $n = \lfloor H/h \rfloor$.

We can define this constraint more formally as follows. Recall
that each event $e \in E$ has a timestamp $\tau(e) \in \mathbb{R}$. Let
$R$ be an interval $R = [a, b] \subseteq \mathbb{R}$. We denote the set
of events in $T \subseteq E$ with timestamps within $R$ as
$T \cap R = \{e \in T \mid \tau(e) \in R\}$.
We define $\delta_w$ as the length of the time period that corresponds
to width $w$ on the timeline; $\delta_w$ can be easily computed from
$W$, $w$, and the beginning and ending timestamps for $E$. Finally,
we say a set $T \subseteq E$ of events satisfies the layout constraint
$\mathrm{Constraints}(T, E, W, w, n)$ if

$$\forall t \in \mathbb{R} : |T \cap [t, t + \delta_w]| \le n \qquad (3)$$

This constraint can be interpreted as follows: for any point in time
$t \in \mathbb{R}$, draw a line segment of length $\delta_w$ starting at
$t$. Consider the intersection of all timestamps in $T$ with this
segment. If the size of the intersection is less than or equal to $n$,
then we know that we can vertically stack the events in the
intersection without violating the height constraint.

It turns out that this constraint forms a p-system that enables us
to prove approximation guarantees (see Section 3.6).

Theorem 3.2. Let $(E, \mathcal{I})$ be an independence family based on
our layout constraint where $T \in \mathcal{I}$ if $T \subseteq E$
satisfies Equation (3). Then $(E, \mathcal{I})$ forms a p-system for
$p = 2$.

Proof Sketch. For the full proof please refer to the appendix of the
full version of this paper. The idea is as follows: We need to
show that $|J_{max}| / |J_{min}| \le 2$ where $J_{min}$ and $J_{max}$
are minimal and maximal bases of an arbitrary subset $Y \subseteq E$.
We can show that $J_{max}$ can be at most twice as large as $J_{min}$
by “deleting” all elements from $J_{max}$ eventually, where in each
step we delete an element $b \in J_{min}$, and up to two elements in
$J_{max}$ (if they exist) in close proximity to $b$. Intuitively this
process works because we can never have an element in $J_{max}$ that we
cannot delete in this way, since then either $J_{min}$ has too few
points to be a maximal independent set (base), or $J_{max}$ has too
many points to be an independent set (it would violate the layout
constraint). Both would contradict our assumption that both $J_{min}$
and $J_{max}$ are bases of $Y$. □


3.5 Optimization Algorithm 

Our problem is reduced to finding a solution $T^*$ that attains
$\max_{T \in \mathcal{I}} \textsc{Rel}(s, T)$, where $\mathcal{I}$ is the independence family
that we defined in Section 3.4 (see Theorem 3.2). Unfortunately,
such problems are NP-hard for many classes of submodular functions,
including weighted coverage [?] (our case). Therefore, we
focus our attention on efficient algorithms with theoretical
approximation guarantees.

As we show in Section 3.6, a greedy algorithm (see Algorithm 1)
has certain approximation guarantees. The algorithm incrementally
builds an approximate solution $T$ (without backtracking), starting
with the empty set. In each iteration it adds an element $e$ from
the set of valid candidates $C$ that most improves the current solution
(according to the marginal gain $\textsc{Rel}_T(e)$), while maintaining
independence of the solution (see line 3).
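
The greedy loop described above can be sketched as follows. This is a minimal sketch under stated assumptions, not a transcription of the paper's Algorithm 1: the names `greedy_select`, `marginal_gain` and `is_independent` are hypothetical, the marginal gain and the independence test are passed in as callables, and candidates are assumed to be hashable.

```python
from typing import Callable, Hashable, Iterable, List


def greedy_select(
    candidates: Iterable[Hashable],
    marginal_gain: Callable[[Hashable, List[Hashable]], float],
    is_independent: Callable[[List[Hashable]], bool],
) -> List[Hashable]:
    """Grow a solution one element at a time, never backtracking."""
    selected: List[Hashable] = []
    remaining = set(candidates)
    while True:
        # Valid candidates are those that keep the solution independent.
        valid = [e for e in remaining if is_independent(selected + [e])]
        if not valid:
            return selected
        # Add the valid candidate with the largest marginal gain.
        best = max(valid, key=lambda e: marginal_gain(e, selected))
        selected.append(best)
        remaining.remove(best)
```

Each iteration rescans all remaining candidates, which is the source of the quadratic number of gain evaluations in the worst case.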

This greedy algorithm has complexity $O(|E|^2)$ (assuming that
computing $\textsc{Rel}_T(\cdot)$ takes constant time). In practice, it can be sped up
significantly by using lazy evaluations, as first proposed
in [?] (see also [?] and Section 2 of [20]). This “lazy greedy”
algorithm exploits the fact that the marginal gain of each element
can only decrease from one iteration to the next; that is,
$\textsc{Rel}_T(e) \ge \textsc{Rel}_{T'}(e)$ for $T \subseteq T'$. We can therefore use
previously computed values as upper bounds to save many evaluations
of $\textsc{Rel}_T$. We use this
more efficient implementation of the greedy algorithm.
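
The lazy evaluation idea can be sketched with a priority queue of possibly stale gains. This is a minimal sketch, not the authors' code: the names `lazy_greedy`, `marginal_gain` and `is_independent` are hypothetical, and it assumes a submodular gain (so stale values are upper bounds) and a downward-closed independence family (so a candidate that is infeasible now stays infeasible).

```python
import heapq
from typing import Callable, Hashable, List, Sequence


def lazy_greedy(
    candidates: Sequence[Hashable],
    marginal_gain: Callable[[Hashable, List[Hashable]], float],
    is_independent: Callable[[List[Hashable]], bool],
) -> List[Hashable]:
    selected: List[Hashable] = []
    # Heap entries: (negated gain, tie-breaker, element, solution size when computed).
    heap = [(-marginal_gain(e, selected), i, e, 0) for i, e in enumerate(candidates)]
    heapq.heapify(heap)
    while heap:
        _, i, e, stamp = heapq.heappop(heap)
        if not is_independent(selected + [e]):
            continue  # can never become feasible again, so drop it
        if stamp == len(selected):
            # The gain is up to date and no stale upper bound exceeds it.
            selected.append(e)
        else:
            # Stale value: recompute for the current solution and re-queue.
            heapq.heappush(heap, (-marginal_gain(e, selected), i, e, len(selected)))
    return selected
```

Only elements that reach the top of the queue are re-evaluated, which is where the savings over the plain greedy loop come from.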

3.6 Approximation Guarantee 

Before we can prove our main theoretical result, we introduce
the following lemma. It is proved in [28] for the special case where
$\mathcal{I}$ is defined by the intersection of $p$ matroids on $X$, and for the
more general case of $p$-systems in Appendix B of [?].

Lemma 3.3. [?, 28] The algorithm GreedyTimeline to compute
$\max_{S \in \mathcal{I}} f(S)$, where $(X, \mathcal{I})$ is a $p$-system and $f : 2^X \to \mathbb{R}_+$
is a monotone submodular set function, has a tight approximation
ratio of $1/(p + 1)$.

We have shown in Theorem 3.1 that $\textsc{Rel}$ is a monotone submodular
set function and in Theorem 3.2 that the temporal constraints
$\textsc{Constraints}$ form a $p$-system for $p = 2$. Following Lemma 3.3,
our greedy algorithm has the following approximation bound.

Theorem 3.4. Algorithm GreedyTimeline has an approximation
ratio of 1/3; that is,

$$\textsc{Rel}(s, \hat{T}) / \textsc{Rel}(s, T^*) \ge 1/3,$$

for any subject $s$, where $\hat{T}$ is the output of our algorithm
GreedyTimeline, and $T^*$ is the optimal solution.

3.7 Zooming in or out of the Timeline 

We have proposed an efficient algorithm for generating timelines.
This efficiency (in particular, the “lazy greedy” property
that allows us to re-use previously computed values) enables us
to quickly (re)compute the timeline if the user chooses to
dynamically zoom in or out of a specific time period. In practice,
we observe running times that are roughly linear in the number of
events and take a few hundred milliseconds, which is much faster than
the quadratic theoretical worst-case bound. An example was given in
Figure 2, where we show the timeline for the most recent few years
of Robert Downey Jr.’s life (cf. Figure 1).

The default interval for each timeline is computed as follows:
we choose the shortest time period that covers at least 90% of all
generated events (restricted to the lifetime of the entity). Note that
for person entities, this time period usually corresponds to less than
90% of their lifetime, based on the intuition that most interesting
events happen to people after they grow up but before they retire.
(See Section 6 for a discussion of when this heuristic can fail.)
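
One way to compute such a default interval is a sliding window over the sorted timestamps. The following is a minimal sketch under stated assumptions: the function name `shortest_covering_interval` is hypothetical, timestamps are plain numbers (for example, years), and the restriction to the entity's lifetime is assumed to have been applied before the call.

```python
import math
from typing import Sequence, Tuple


def shortest_covering_interval(
    timestamps: Sequence[float], coverage: float = 0.9
) -> Tuple[float, float]:
    """Return (start, end) of the shortest period containing the required share of events."""
    ts = sorted(timestamps)
    if not ts:
        raise ValueError("need at least one timestamp")
    # Number of events the interval must contain.
    k = max(1, math.ceil(coverage * len(ts)))
    # The shortest such interval starts and ends on an event, so it is enough
    # to examine every run of k consecutive sorted timestamps.
    best = min(range(len(ts) - k + 1), key=lambda i: ts[i + k - 1] - ts[i])
    return ts[best], ts[best + k - 1]
```

For ten events of which one lies far in the past, the window drops the outlier and keeps the nine clustered events.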


4. EVALUATION 

In this section, we evaluate the quality of our method for producing
timelines. Since there is no ground truth to compare to, we
asked Amazon Mechanical Turk raters to vote for their preferred
timeline. We do this in a series of paired comparisons in which
we vary one component of the algorithm at a time, resulting in six
different models, summarized in Table 1. The results are shown in
Table 2 and Figure 6, and are explained shortly.

In summary, our experiments show that (1) users always significantly
prefer our full method over baseline and state-of-the-art
methods; (2) enforcing temporal diversity and content diversity
significantly improves the results; and (3) both entity relevance and
date relevance contribute to generating a quality entity timeline.

4.1 Experimental setup 

We generated timelines for 250 popular entities (75 music artists
and bands, 75 actors, 50 politicians, 50 athletes) for each of the six
methods in Table 1. We chose popular entities instead of random
or tail entities because evaluations cannot be trusted on entities that
most raters are not at all familiar with. Popular entities also account
for the major share of the total query volume, and their large number
of candidate events and often long lifespan make timeline generation
particularly challenging. Furthermore, we chose to evaluate
timelines through pairwise preferences rather than absolute quality
judgments, as this has often been found to be less subjective and
thus more reliable [?].

Let $T(e, m)$ denote the timeline for entity $e$ generated by model
$m$; let $m = 0$ denote the full model (the control), and let
$m \in \{1, \dots, 5\}$ denote one of the ablated models (experimental conditions;
described in Table 1 and the following sections). For each entity
$e$, we displayed the control timeline $T(e, 0)$ and the experimental
timeline $T(e, m)$ for $m > 0$, one above the other; we randomized
the decision whether the experiment or control was shown on top.

We asked each rater which timeline they preferred, on a 5-point
scale, corresponding to strongly preferring the top one, slightly
preferring the top one, being neutral, slightly preferring the bottom
one, and strongly preferring the bottom one. We also asked
each rater to give qualitative comments to justify their decision, to
gain further insight. Each pair of timelines is rated by five different
raters (1154 distinct raters in total). We encouraged raters to
research each entity (e.g., using Wikipedia) before evaluating each

A demo is available at 

http://cs.Stanford.edu/~althoff/timemachine 
Experiments on varying $w_1$, $w_2$, $w_3$ show that the results are
insensitive to the exact parameter values as long as $w_1 \gg w_2 \gg w_3$
(cf. backoff smoothing [19]; see Section 3.3.1).


Name       w_1^E   w_2^E    w_3^E    w_1^D   w_2^D    TD   CD
Full       1       ?        ?        1       10^-?    1    1
Base       0       0        1        0       0        1    1
Full-E2D   ?       ?        ?        0       0        1    1
Full-E2E   0       0        10^-?    1       ?        1    1
Full-TD    ?       ?        10^-?    1       10^-?    0    1
Full-CD    1       10^-?    10^-?    1       10^-?    1    0

Table 1: Summary of experimental configurations. We fix
$\lambda = 0.75$ throughout. TD = temporal diversity, CD = content
diversity. Full-TD means the full model without temporal
diversity, etc. Note that all methods remove duplicate events,
which is a minimal form of content diversity, but if CD = 1, we
ensure diversity amongst types of events (entities and paths) as
well; see Section 4.5 for details.


Ablated model   #Tasks   #Raters   RAggr    RPref
Base            1250     344       77.0%    83.8% ***
Full-E2D        1250     463       75.7%    59.8% **
Full-E2E        1250     676       73.2%    64.3% ***
Full-TD         150      53        75.3%    86.7% ***
Full-CD         1250     665       81.0%    91.1% ***

*** p < 0.001, ** p < 0.01, * p < 0.05

Table 2: Summary of the user studies. Each row shows an ablated
version that was compared to the full model. The asterisks
represent the p-value corresponding to a Binomial hypothesis
test that compares the RPref value to 50%.


timeline about that entity; fortunately, 79% of raters reported that 
they were already familiar with these entities. 

To simplify the analysis, we collapsed the user votes to a 3-point
scale: prefer control (full model), neutral, or prefer experiment
(ablated model). Let $V(e, m, r) \in \{F, T, A\}$ be the vote by rater
$r \in R$ for entity $e \in E$ and method $m \in M$, where $F$ represents
preferring the full model, $T$ represents a tie, and $A$ represents preferring
the ablated model. Let $N(e, m, v) = |\{r \in R : V(e, m, r) = v\}|$
be the number of raters who voted for category $v \in \{F, T, A\}$, for
entity $e$ and method $m$. Let $M(e, m) = \arg\max_v N(e, m, v)$ be
the majority vote.

We compute agreement between raters (RAggr) as the fraction
of raters agreeing with the majority vote (including tie votes and
tied majorities):

$$\mathrm{RAggr}(m) = \frac{|\{V(e, m, r) = M(e, m) : e \in E, r \in R\}|}{|\{V(e, m, r) : e \in E, r \in R\}|}$$


We define the rater preference (RPref) for the full method as the
fraction of times the majority of raters vote for the full method,
excluding cases where there is no clear majority; that is, we set
$M(e, m) = \text{NULL}$ if the majority is not unique (e.g., if we have 2
votes for $F$, 2 for $A$, and 1 for $T$):

$$\mathrm{RPref}(m) = \frac{|\{M(e, m) = F : e \in E\}|}{|\{M(e, m) \in \{F, A\} : e \in E\}|}$$


(Note that a 5:0 vote in favor of full (5 F, 0 A) is treated the same 
as a 3:2 vote.) If both methods are equally good, we would expect 
both the full and the ablated model to win exactly 50% of the time; 
that is, RPref (m) = 0.5 (our null hypothesis). This allows us to 
use a simple two-sided Binomial hypothesis test of significance. 
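
These statistics can be sketched with the Python standard library. This is a minimal sketch, not the authors' analysis code: all function names are hypothetical, votes are assumed to be given per entity as a non-empty list of the labels "F", "T" and "A", and for tied majorities the agreement count uses the size of the largest vote group (one reading of "including tied majorities"). The two-sided test sums the probabilities of all outcomes that are no more likely than the observed one under a success probability of 0.5.

```python
from collections import Counter
from math import comb
from typing import Dict, List, Optional


def majority(votes: List[str]) -> Optional[str]:
    """Unique most frequent label, or None if the majority is tied."""
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def raggr(votes_by_entity: Dict[str, List[str]]) -> float:
    """Fraction of individual votes that agree with a most frequent label."""
    agree = sum(max(Counter(v).values()) for v in votes_by_entity.values())
    total = sum(len(v) for v in votes_by_entity.values())
    return agree / total


def rpref(votes_by_entity: Dict[str, List[str]]) -> float:
    """Share of entities won by the full model among those with majority F or A."""
    winners = [majority(v) for v in votes_by_entity.values()]
    full, ablated = winners.count("F"), winners.count("A")
    return full / (full + ablated)


def binomial_two_sided_p(successes: int, trials: int) -> float:
    """Exact two-sided Binomial test against a success probability of 0.5."""
    observed = comb(trials, successes)
    # With p = 0.5 every outcome k has probability comb(trials, k) / 2**trials.
    extreme = sum(
        comb(trials, k) for k in range(trials + 1) if comb(trials, k) <= observed
    )
    return extreme / 2 ** trials
```

Here `successes` would be the number of entities whose unique majority is F and `trials` the number of entities whose unique majority is F or A.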


4.2 Baseline Algorithms 

In our initial trial, we defined the baseline algorithm as follows:
rank all the candidate events by the G2E global entity score, and
then show the top $K$ events (where each event is represented by
a box on a timeline of width 1000 pixels, and we allow up to
$n = 2$ boxes to be stacked vertically). However, we found that
the G2E score sometimes picked the same related entity more than
once (e.g., if Robert Downey Jr. starred in Iron Man and later won
an award for it, we may display Iron Man twice, at two different
time points). Users strongly disliked this in preliminary experiments,
so we decided to augment the baseline through post-filtering:
we remove duplicate entities from the results of all methods (except
in Section 4.5), keeping the highest scoring event in each case (we
never allow duplicate events).

Figure 6: Fraction of entities for which raters preferred the Full
approach over the ablated version (RPref), along with bootstrapped
95% confidence intervals. Results show significant preference
for our proposed approach in all cases.

In addition, we noticed that the baseline model would sometimes
result in visual crowding, since it does not enforce temporal diversity.
Users strongly disliked this as well, so we decided to further
improve the baseline by post-processing all results and removing
temporally overlapping events, keeping the highest scoring event
in each case.
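
The two post-filters can be sketched as a single pass over the events in descending score order. This is a minimal sketch under stated assumptions, not the authors' code: the name `filtered_baseline` is hypothetical, events are assumed to be `(entity, timestamp, score)` tuples, two events are taken to overlap when their timestamps are less than one box width `t_w` apart, and the cut-off at `k` events is applied after filtering (the text does not fix this order).

```python
from typing import List, Tuple

Event = Tuple[str, float, float]  # (entity, timestamp, score)


def filtered_baseline(events: List[Event], k: int, t_w: float) -> List[Event]:
    """Keep up to k top-scoring events, skipping duplicates and overlaps."""
    kept: List[Event] = []
    seen_entities = set()
    for entity, time, score in sorted(events, key=lambda ev: ev[2], reverse=True):
        if entity in seen_entities:
            continue  # duplicate entity: a higher-scoring event is already kept
        if any(abs(time - other_time) < t_w for _, other_time, _ in kept):
            continue  # temporally overlaps a higher-scoring event
        kept.append((entity, time, score))
        seen_entities.add(entity)
        if len(kept) == k:
            break
    return kept
```

Because the filter runs after ranking, a removed event is not replaced by a non-overlapping alternative, which is the limitation the full model avoids by handling overlap during optimization.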

In practice, we can implement this modified baseline Base by
using our constrained submodular optimization algorithm, but setting
the weights so that $w_1^E = w_2^E = w_1^D = w_2^D = 0$ and $w_3^E = 1$,
thus putting all the emphasis on the G2E signal. This is because
the greedy optimization algorithm will ensure that it never adds an
event that temporally overlaps an existing event. Most of the time
our submodular coverage function does not yield any duplicate
entities. In the rare case that the optimization does lead to duplicate
entities, we explicitly remove them to ensure that entity diversity
has no impact on any other experiment (except Section 4.5, which
explicitly measures the importance of content diversity).

We can see from the results (Table 2 and Figure 6) that on average,
84% of the time raters prefer our full model (significant
difference at $p < 0.001$ according to a Binomial test). This shows
that a global relevance score is inadequate, even when augmented
by temporal diversity and content diversity.

The only other relevant baseline known to us is the CATE system
described in [39]. CATE ranks related entities by co-occurrence
with the timeline entity within documents of a given context. This
approach is very similar to the E2E approach described in
Section 3.3.1. We consider our E2E signal an improvement over the
CATE baseline for three reasons. First, we only consider the more
direct connections between entities in a KB, rather than co-occurrence
within the same document. Second, our relevance signal captures
co-occurrence on a large web corpus within a small window, which
gives higher coverage and more focus compared to document-wide
co-occurrence within Wikipedia only. Third, we perform subset
selection instead of a static ranking, which allows selected events
to influence which other events are selected next. Section 4.3 shows
that our methods outperform E2E.

4.3 Evaluating Relevance Signals 

To compare the different ways of measuring event relevance, we
performed two experiments. First, we “turned off” the date signal
DRel by setting $w_1^D = w_2^D = 0$. We call this model Full-E2D,
meaning the full model without the E2D signal. Raters prefer our
full model to this version about 60% of the time, which is a significant
difference at the level of $p < 0.01$ (Binomial test).

The utility of the date signal depends on which kind of entity we
are creating a timeline for. For people, it is common to find the
date of birth, death, marriage or other key events to be explicitly
mentioned on the web; this makes it relatively easy to determine
that these events are important.

Second, we “turned off” the ERel signal by setting $w_1^E = w_2^E = 0$.
We call this model Full-E2E. Raters prefer our full model to this
about 64% of the time, indicating that the E2E signal is somewhat
more important than E2D (significant at $p < 0.001$).

However, the benefit of the E2E signal varied by domain/vertical:
we found it most useful for actors and athletes, whereas for musicians,
the E2D signal was more helpful. We attribute this to conventions
in what entities and dates are co-mentioned on the web (in
close proximity). E2D works well for music artists because important
dates such as album release dates and tour dates are frequently
mentioned across many websites (online stores, ticketing websites,
etc.). However, this is different in the movie domain. There are
many more entities related to the movie (director, producer, dozens
of actors, etc.) and only a few of them will be highlighted in close
proximity to the movie release date (usually one or two star actors).
How helpful the E2D signal is depends on what usually gets mentioned
in close proximity to the date, which is subject to certain
conventions and marketing decisions. For instance, the first Pirates
of the Caribbean movie (2003) has a lower E2D score for actor
Johnny Depp than later sequels, even though the first movie was
the bigger milestone for Johnny Depp’s career. The sequel promotions
just featured Johnny Depp (who had gained in popularity)
more prominently. We found the E2E signal to be more generally
applicable and less influenced by such effects.

We further found that the E2D signal has less utility when events 
do not exhibit a clear temporal focus such as long military conflicts 
(compared to birth/death/marriage dates or concert tours). In these 
cases, the E2E signal is helpful in providing additional information 
in the event selection phase. 

4.4 Evaluating Temporal Diversity 

As we mentioned in Section 4.2, users strongly dislike when displayed
events overlap in time, since it is not easy to see the corresponding
images and descriptions. Indeed, we see that in 86% of
the experiments, raters prefer our full model over an ablated version,
which we call Full-TD, that maximizes relevance and content
diversity but without any temporal constraint (significant at
$p < 0.001$). This is despite the fact that the ablated model also
includes the simple overlap filter we described in Section 4.2. The
number of events we show is controlled and set to the number of
events in our Full approach, as we aim to measure the impact of
temporal diversity while controlling for the amount of information
shown (though the overlap filter may remove some of them). The
reason the full model works better is that it can take into account the
temporal overlap during the optimization process, so if one event is
removed, another (non-overlapping) event can be added instead.

4.5 Evaluating Content Diversity 

As we mentioned in Section 4.2, users strongly dislike when the
same entity is repeated (with different timestamps), so we always
remove such cases from all methods. Here, we quantify the importance
of content diversity in generating timelines. Note that in addition
to entity diversity there are other, slightly more subtle forms
of content diversity that we might wish to consider. For example,
we might not want to list only the different movies that an actor
has been in, even if they all have high relevance scores; instead we
wish to include awards, TV shows, and personal relationships as
well. Our submodular set cover objective captures this by using the
E2EPATH feature, which gives a higher score to a set of events with
distinct path types (see Section 3.3.1).

Footnote: Of course, our system treats birth and death dates as special,
since they inform the beginning and end of the timeline for a person
(see Section 3.7).

To evaluate this, we consider an alternative model in which we
evaluate the score by summing over the multiset (rather than set) of
related entities (or paths to related entities), allowing for duplicate
entities or paths during the optimization process; we call this
Full-CD. We see that raters prefer our full model 91% of the time
compared to this ablated model (significant at $p < 0.001$). Again, we
attribute this to the fact that the full model is aware of the penalty
for duplication during the optimization process, and can adjust its
output appropriately.
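
The difference between summing over a set and summing over a multiset can be illustrated with a toy scoring function. This is a minimal sketch, not the paper's objective: the function names and the path-type weights are hypothetical, and each selected event is represented only by its path type.

```python
from typing import Dict, List


def set_coverage(selected_paths: List[str], weights: Dict[str, float]) -> float:
    """Each distinct path type counts once, so repeats add nothing."""
    return sum(weights[p] for p in set(selected_paths))


def multiset_coverage(selected_paths: List[str], weights: Dict[str, float]) -> float:
    """Every occurrence counts, so repetition is never penalized."""
    return sum(weights[p] for p in selected_paths)
```

With weights of 3.0 for "film", 2.0 for "award" and 2.5 for "spouse", three films score 3.0 under the set objective but 9.0 under the multiset objective, while one film, one award and one spouse event score 7.5 under both. Only the set objective therefore prefers the diverse selection.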

5. RELATED WORK 

There has been much work on extracting temporal events from
text [17, 25], and in summarizing large text corpora such as tag
streams [13], news corpora [?], Wikipedia biographies [?], and
Wikipedia edit histories [38]. There has also been work on mining
temporal patterns across such textual data sets [16, 36, 41].

Another body of related work concerns document summarization
[2, 34]. The evaluation of summarization approaches has always
been challenging, and measures like Rouge [23] are often used if
ground truth summaries are available. In our case, we use paired
comparisons, since we do not have ground truth.

The summarization and IR communities have identified diversity
as an important quality criterion [8, 24]. More recently, research
has focused on complementing traditional corpus-based relevance
measures with signals such as social attention [42]. Early work on
timeline generation by Swan and Allan [37] attempted to summarize
a news corpus by displaying major events and topics along a
timeline. In a similar spirit, Shahaf et al. [?] have created maps of
information that summarize complex storylines across news documents.
Similar techniques have been applied to scientific literature
[30, 33]. Our paper extends this line of work by using multiple
relevance signals (based on web co-occurrence), as well as showing
that content and temporal diversity are critical for quality timelines.

Submodular optimization has been shown to be a powerful framework
for summarization [18, 24, ?, 31, ?], since it naturally captures
notions of diversity through its diminishing returns properties [10].
Furthermore, there are efficient approximation algorithms
with theoretical guarantees [?, 7, 20, ?, 27, 28].

Some recent work has focused on generating personalized timelines
based on Facebook [?] or Twitter feeds [22]. Timelines
generated based on information from KBs have been considered
in [26, 39, 40]; these papers are the ones most related to our approach.
However, there are several differences. First, [40] does
not consider a ranking of individual events (required when space
is limited) nor visual space constraints, so there is no optimization
algorithm involved. Instead, they simply display all events, which
is not an option in our context as each timeline entity might have
hundreds of candidate events (see Figure ?). We have empirically
shown that it is absolutely necessary to address relevance, redundancy,
and space constraints to generate quality timelines. Second,
[39] considers ranking related entities but uses a different notion
of relatedness (sharing many contexts rather than more direct
connections in the KB). In this approach, it is impossible to capture
relationships between selected events as the ranking is static. To
the best of our knowledge, this is the only relevant baseline, and
we show in Section 4.3 that our proposed method outperforms an
improved reimplementation of this approach. Third, none of these
papers conducts any quantitative evaluation of their timelines.

Finally, we should mention that Bing [29] has released a system
called “Timeline” that is somewhat similar to ours. However, there
are (to the best of our knowledge) no published accounts of how
their system works. Furthermore, their timelines are static, and do
not allow the user to interact with the timeline, a feature which we
consider to be very important, especially for mobile browsing.

6. FUTURE WORK 

In this section, we suggest some directions for future work, based 
in part on the comments written by the raters. 

Choosing a better default timespan. As we discussed in
Section 3.7, the algorithm picks a default time span for a person that
covers 90% of their generated life events. However, sometimes
this is suboptimal. For example, consider the US president John
F. Kennedy: many important events occurred in the last few years
of his life (assassination, presidency, Cuban missile crisis, Bay of
Pigs invasion, etc.). Our default timespan misses many of these.
In particular, his assassination, his presidency, and his marriage to
Jacqueline Kennedy Onassis are chosen first, and these then block
other important events such as the Cuban missile crisis or his
involvement in the Vietnam war.

We address this problem by allowing the user to zoom in to the
appropriate period. Other potential solutions include using the E2D
scores to weight some time periods higher than others, and including
the search over suitable time periods as part of the optimization.

Time points vs intervals. Our algorithm represents events based
on a single point in time. However, some events (e.g., wars) are
more naturally associated with intervals. Currently our method
may pick the start or end of a war, but might not show both, due
to the diminishing returns property. This could be fixed by modifying
the algorithm to reason about temporal intervals.

Choosing how to describe an event. Sometimes a related entity 
is connected to the subject via many different paths, and all have 
the same timestamp. In this case, it is hard to know which relation 
to show to the user. For example, the system sometimes describes a 
date associated with someone’s death as the end of their marriage; 
while technically true, this is rather unintuitive. Another example 
concerns US presidents: sometimes such people are described as 
being a military commander. Again, while technically true (since 
the US president is also the Chief of the Armed Forces), this is 
unintuitive to users. We may be able to fix this problem by learning 
a ranking model applied to particular candidate values for any given 
subject and relation or by influencing the way the data is curated. 

User preferences and subjectivity. In some cases, raters did not
agree on which timeline was best. The reason often seems to boil
down to individual preferences. In our experiments, the biggest
area of disagreement is over how much the timeline should be focused
on professional life (e.g., jobs, albums, books) vs personal
life and relationships (e.g., marriage, children). Users had different
opinions, even for the exact same timeline subjects, which illustrates
the need for personalization in this space. One approach
would be to distinguish between professional and personal events,
and to allow some trade-off parameter between them.

Extractive vs abstractive summarization. Our current approach
to building a timeline is similar to “extractive summarization”
techniques in the NLP community, in the sense that we select a set of
events from a candidate pool. However, sometimes this is suboptimal,
since the relationship between two entities may be more complex.
For example, Robert Downey Jr.’s father (Robert Downey Sr.)
shows up on his timeline, but is described as being a co-star in a movie
rather than being his father. While technically correct, it would be
more satisfying to create an abstract summarization of the relationship,
describing that Robert Downey Sr. is both the father and a
co-star. We leave this to future work.

Creating timelines for collections. In the future, we would like 
to go beyond timelines for single entities, and derive a method to 
summarize collections of entities (e.g., 1930s jazz artists), periods 
of time (1920s in the U.S.), or long-lasting events (World War II). 

7. CONCLUSIONS 

We presented a system called TimeMachine for automatic timeline
generation for entities in a knowledge base. The timeline generation
problem is formulated in a submodular optimization framework
that jointly optimizes for relevance, content diversity and temporal
diversity. Web-based co-occurrence signals are used to determine
the relevance of other entities and dates to the timeline
subject. We proved that an efficient greedy approximation algorithm
achieves near-optimal performance. The proposed approach
is evaluated through a comprehensive series of user studies demonstrating
that both temporal diversity and content diversity are crucial,
and that web-based co-occurrence signals significantly improve
over a baseline model that relies on global importance.

9. APPENDIX

9.1 Proof of Approximation Guarantee 

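We repeat the definition from Section 3.4. A set $T \subseteq E$ of events
satisfies the layout constraint $\textsc{Constraints}(T, E, W, w, n)$ if

$$\forall t \in \mathbb{R} : |T \cap [t, t + t_w)| \le n$$

This can be checked very efficiently because of the following
equivalent formulation. Let $\tau(Y) = \{\tau(y) \mid y \in Y\}$ be the set of
timestamps for all events in some subset of events $Y \subseteq E$.

Theorem 9.1. Equivalence of layout constraints: (3) $\Leftrightarrow$ (4), where

$$\forall S \subseteq T, |S| \ge n + 1 : \max(\tau(S)) - \min(\tau(S)) \ge t_w \qquad (4)$$

Proof. 1. Show $\neg$(4) $\Rightarrow$ $\neg$(3):

$$\exists S \subseteq T, |S| \ge n + 1 : \max(\tau(S)) - \min(\tau(S)) < t_w$$

Let $S^*$ be a witness for such a set $S$. Let $t' = \min(\tau(S^*))$ and
$t'' = \max(\tau(S^*))$ (note that $t' + t_w > t''$). Then, with
$S^* = S^* \cap [t', t''] = S^* \cap [t', t' + t_w) \subseteq T$:

$$|S^* \cap [t', t' + t_w)| \ge n + 1$$
$$\Rightarrow |T \cap [t', t' + t_w)| \ge n + 1$$
$$\Rightarrow \exists t \in \mathbb{R} : |T \cap [t, t + t_w)| \ge n + 1$$

2. Show $\neg$(3) $\Rightarrow$ $\neg$(4):

$$\exists t \in \mathbb{R} : |T \cap [t, t + t_w)| \ge n + 1$$

Let $t^*$ be one such $t$. Define $S^* = T \cap [t^*, t^* + t_w)$. Then

$$\exists S^* \subseteq T, |S^*| > n : \max(\tau(S^*)) - \min(\tau(S^*)) < t_w$$

□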

In order to show that we can get a good approximation to the
optimal solution of Equation ? (see Section ?) through an efficient
greedy algorithm, we now show that the independence family
$(E, \mathcal{I})$, where $T \in \mathcal{I}$ if $T$ satisfies Equation 3 ($T \subseteq E$), is a
$p$-system for $p = 2$.

Let $J_{\max}$ and $J_{\min}$ be bases of $Y \subseteq E$ with
$|J_{\max}| = \max_{J : J \text{ is a base of } Y} |J|$ and
$|J_{\min}| = \min_{J : J \text{ is a base of } Y} |J|$ (recall
Section ?). We choose $Y \subseteq E$ instead of $T \subseteq E$ in this definition
to highlight that the $p$-system property (Equation ?) has to hold
for all subsets of events and not only for the timeline subset $T$.

Definition 9.1. We say an element $a \in E$ is blocked in independent
set $T$ (below this will be either $J_{\min}$ or $J_{\max}$) if it could not be
added to $T$ without violating the layout constraint (Equation 3).
Formally, this means $T \in \mathcal{I}$ but $T \cup \{a\} \notin \mathcal{I}$.

Definition 9.2. We define a $2t_w$ ball around $e \in E$ as the following
interval: $\mathrm{ball}(e) = (\tau(e) - t_w, \tau(e) + t_w)$.

Lemma 9.2. If $a$ is blocked in $T$, we have $|T \cap \mathrm{ball}(a)| \ge n$.
Proof. There must be some interval of size $t_w$ containing $a$ that has
at least $n$ elements; otherwise $a$ would not be blocked and could be
added to $T$ while remaining an independent set. □

Definition 9.3. We define the DeletionStep algorithm as follows:
Input: $J_{\min}$, $J_{\max}$
Output: $J'_{\min}$, $J'_{\max}$

1. Let $b$ be the minimum element in $J_{\min}$. Let $J'_{\min} = J_{\min} \setminus \{b\}$;
that is, we delete $b$.

2. Let $a_1, \dots, a_k$ be all elements in $J_{\max} \cap \mathrm{ball}(b)$ in sorted order
(i.e., $a_1 \le a_2 \le \dots \le a_k$). Let $J'_{\max} = J_{\max} \setminus \{a_1, a_2\}$;
that is, we delete the first two elements (if both elements exist;
otherwise we delete just one or zero elements).
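
A single DeletionStep can be sketched as follows. This is a minimal sketch for illustration only, not part of the proof: the name `deletion_step` is hypothetical, and events are assumed to be represented directly by their numeric timestamps, so that the ball around $b$ is the open interval of radius $t_w$.

```python
from typing import List, Tuple


def deletion_step(
    j_min: List[float], j_max: List[float], t_w: float
) -> Tuple[List[float], List[float]]:
    """Delete the smallest element b of j_min and up to two elements of j_max near b."""
    b = min(j_min)
    new_min = list(j_min)
    new_min.remove(b)
    # Elements of j_max inside the open ball (b - t_w, b + t_w), in ascending order.
    in_ball = sorted(a for a in j_max if b - t_w < a < b + t_w)
    new_max = list(j_max)
    for a in in_ball[:2]:  # at most the first two elements are deleted
        new_max.remove(a)
    return new_min, new_max
```

Each call removes exactly one element from the smaller base and at most two from the larger one, which is the counting argument behind the factor of 2.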


Figure 7: Intuition for proof of Lemma 9.3 ($n = 1$ for simplicity).
Case (a) is possible while (b) and (c) show impossible
situations (see proof for details).

This algorithm is helpful in proving Theorem 3.2 for the following
reason. If we can eventually delete all elements from $J_{\max}$ using
multiple iterations of DeletionStep, then $J_{\max}$ can be at most
twice as large as $J_{\min}$ (which is exactly what we need to show).

Definition 9.4. We define the following invariant: Let $J_{\min}$ and
$J_{\max}$ be minimal and maximal bases of $Y$.

$$\forall a \in J_{\max} : J_{\min} \cap \mathrm{ball}(a) \ne \emptyset \qquad (5)$$

This invariant captures the necessary condition to be able to delete
every element in $J_{\max}$ eventually, because for every such element
we have an element in $J_{\min}$ that could still delete it. Next we show
that this condition is actually invariant under the previously defined
DeletionStep. This means that we will be able to delete every
element in $J_{\max}$ eventually (used in the proof of Theorem 3.2).

Intuitively, this invariant holds because we can never have an
element in $J_{\max}$ which is not covered in the sense of the invariant,
since then either (1) $J_{\min}$ has too few points to be a maximal
independent set (base), or (2) $J_{\max}$ has too many points to be an
independent set. This intuition is illustrated in Figure 7 (for $n = 1$).
Case (a) shows a possible situation in which each element in $J_{\max}$
has an element in $J_{\min}$ “in range” and both $J_{\min}$ and $J_{\max}$ are valid
bases. Case (b) illustrates an impossible case in which there is no
corresponding element in $J_{\min}$ for the rightmost element of $J_{\max}$.
In this case the rightmost element could be added to $J_{\min}$ while
remaining an independent set, so $J_{\min}$ cannot be a base. Case (c)
illustrates a different impossible case in which $J_{\max}$ has too many
points to be an independent set. We have $n = 1$ but there are two
points in close proximity.

Lemma 9.3. Let $J_{\min}$ and $J_{\max}$ be minimal and maximal bases of
$Y$. Let $J'_{\min}$ and $J'_{\max}$ be the sets obtained from $J_{\min}$ and $J_{\max}$
after one or more iterations of DeletionStep. The invariant
holds initially and after any iteration of DeletionStep.

Proof. Invariant holds initially. 

Let a ∈ Jmax. Initially, Jmin ∩ ball(a) ≠ ∅, since otherwise a 
would not be blocked in Jmin and could be added to it, which 
contradicts Jmin being a maximal independent set, i.e., a base 
(cf. Figure 7(b)). 

Invariant holds after any number of DeletionStep iterations. 

We now show by contradiction that the invariant holds after one 
or more iterations of DeletionStep. Let Jmin and Jmax be the 
bases before the DeletionStep(s), and let J'min and J'max be the 
bases afterwards. Assume the invariant holds before but not after. 
Let a* ∈ J'max be an element for which the invariant is now 
violated, so J'min ∩ ball(a*) = ∅. We do a case analysis on a*. 

Case 1: a* is not blocked in Jmin 
If a* is not blocked in Jmin, we could add it to the base and obtain 
an independent set, which contradicts Jmin being a base 
(cf. Figure 7(b); same as in the initial case above). 













Case 2: a* is blocked in Jmin 

Element a* can only be blocked in Jmin if |Jmin ∩ ball(a*)| ≥ n. 
Since we assume the invariant does not hold for the new bases, 
we have J'min ∩ ball(a*) = ∅, so we must have deleted at least 
n elements from Jmin in previous iterations of DeletionStep. 
However, we now show that this results in a contradiction. 

We know element a* has not been deleted yet. All of the n 
or more elements in Jmin ∩ ball(a*) would have been “in range” to 
delete a* ∈ Jmax. Since none of them deleted a*, each of these 
steps must have deleted two other elements instead. Therefore, we 
must have deleted at least 2n elements from Jmax previously. These 
elements must have been in (a* − 2tw, a* + 2tw), the range covered 
by the n or more elements in Jmin ∩ ball(a*). 

Note that we delete elements in Jmax in ascending order. Since 
a* was not deleted yet, all deleted elements must be smaller than or 
equal to a*, i.e., at least 2n elements from Jmax are actually all in 
(a* − 2tw, a*]. Together with a* we have at least 2n + 1 elements 
in Jmax ∩ (a* − 2tw, a*], an interval of length smaller than 2tw. 
This contradicts Jmax being an independent set or base, since 
our layout constraint dictates that we cannot have more 
than 2n elements in an interval of that length (cf. Figure 7(c)). 

Therefore, the invariant has to hold initially and after any number 
of iterations of DeletionStep. □ 


Theorem 3.2. Let (E, I) be an independence family based on our 
layout constraint, where T ∈ I if T satisfies the layout constraint 
(T ⊆ E). Then (E, I) forms a p-system for p = 2. 

Proof. For a p-system with p = 2, we need to show that 
|Jmax|/|Jmin| ≤ 2. 

From Lemma 9.3 we know that the invariant holds initially. We 
now repeatedly apply DeletionStep. In each step we delete one 
element from Jmin and up to two elements from Jmax. In the end 
we have J'min = ∅ and |J'max| ≥ |Jmax| − 2|Jmin|. 

From Lemma 9.3 we know that the invariant still holds after any 
number of iterations. Therefore, J'min = ∅ implies J'max = ∅, 
because any remaining element of J'max would need an element of 
J'min in its ball. We get |J'max| = 0 ≥ |Jmax| − 2|Jmin|. Because 
all this holds for arbitrary minimal and maximal bases Jmin and 
Jmax, this is equivalent to our proposition: |Jmax|/|Jmin| ≤ 2. □ 


This shows that we can obtain a close-to-optimal solution to 
Equation using an efficient greedy algorithm (see Section [Th^. 
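
The bound of Theorem 3.2 can be sanity-checked by brute force on tiny instances. The sketch below is illustrative only and rests on an assumption that this section does not confirm: we read the layout constraint as "at most n events in any closed window of width tw". All function names and the example timestamps are hypothetical, and the enumeration is exponential, so it is only suitable for a handful of points.

```python
from itertools import combinations


def is_independent(points, n, tw):
    # ASSUMED layout constraint: any n + 1 consecutive points span more than tw,
    # i.e. no closed window of width tw contains more than n points.
    s = sorted(points)
    return all(s[i + n] - s[i] > tw for i in range(len(s) - n))


def bases(ground, n, tw):
    # Enumerate all maximal independent subsets (bases) of a small ground set
    # of distinct timestamps.
    ground = list(ground)
    result = []
    for k in range(len(ground) + 1):
        for subset in combinations(ground, k):
            if not is_independent(subset, n, tw):
                continue
            rest = [y for y in ground if y not in subset]
            # A base cannot be extended by any remaining element.
            if all(not is_independent(subset + (y,), n, tw) for y in rest):
                result.append(subset)
    return result


def base_size_ratio(ground, n, tw):
    # |Jmax| / |Jmin| over all bases of a non-empty ground set (n >= 1).
    sizes = [len(b) for b in bases(ground, n, tw)]
    return max(sizes) / min(sizes)


# Hypothetical instance: three events one time unit apart, n = 1, tw = 1.5.
ratio = base_size_ratio([0, 1, 2], n=1, tw=1.5)
```

In this instance the bases are {1} and {0, 2}, so the ratio is exactly 2, which shows that the bound p = 2 can be tight under the assumed constraint.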


9.2 Implementation Details of the Candidate 
Generation Step 

The KB used in this paper, Freebase, uses Compound Value Type 
(CVT) nodes as a way to represent n-ary relations in triple form. (In 
the semantic web community, CVTs are known as “blank nodes”.¹) 
Such CVTs reify the n-ary relation itself as a node; each property 
of this CVT node corresponds to one of the slots in the n-ary 
relation. For example, the event that Robert Downey Jr. starred in The 
Avengers while playing the role of Iron Man is represented by the 
following triples: 


    /m/Robert    /film/actor/film               CVT 
    CVT          /film/performance/character    /m/IronMan 
    CVT          /film/performance/film         /m/Avengers 


This representation makes the movie more than one hop away from 
the actor, and therefore makes the process outlined above a bit more 
complex. To address this, we “collapse” the CVT nodes by replacing 
each path a --p1--> CVT --p2--> b by a single edge a --> b. 
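
The collapsing step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: triples are plain (subject, predicate, object) tuples, the `is_cvt` test and the node ID `cvt1` are hypothetical, and labelling the new edge with the pair (p1, p2) is our choice, since the text only says that the path becomes a single edge.

```python
from collections import defaultdict


def collapse_cvts(triples, is_cvt):
    # Replace every two-hop path a --p1--> CVT --p2--> b by one edge a --> b.
    incoming = defaultdict(list)  # CVT id -> [(a, p1)]
    outgoing = defaultdict(list)  # CVT id -> [(p2, b)]
    collapsed = []
    for s, p, o in triples:
        if is_cvt(o):
            incoming[o].append((s, p))
        elif is_cvt(s):
            outgoing[s].append((p, o))
        else:
            collapsed.append((s, p, o))  # ordinary edge, kept unchanged
    for cvt, sources in incoming.items():
        for a, p1 in sources:
            for p2, b in outgoing[cvt]:
                # ASSUMPTION: the collapsed edge is labelled with (p1, p2).
                collapsed.append((a, (p1, p2), b))
    return collapsed


# The example triples from the text, with a hypothetical CVT id "cvt1".
example = [
    ("/m/Robert", "/film/actor/film", "cvt1"),
    ("cvt1", "/film/performance/character", "/m/IronMan"),
    ("cvt1", "/film/performance/film", "/m/Avengers"),
]
edges = collapse_cvts(example, lambda node: node.startswith("cvt"))
```

After collapsing, both /m/IronMan and /m/Avengers are one hop away from /m/Robert.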

¹ See http://en.wikipedia.org/wiki/Blank_node 

The second issue that CVTs raise can be illustrated through the 
following example. When a musician plays for a band, this event 
is represented by a CVT, which captures the role he played (e.g., 
singer or drummer), the name of the band, and the date he joined. 
If two musicians play for the same band, their corresponding CVTs 
will have different IDs. This will break the above compound event 
heuristic. We solve this issue by replacing the CVT ID by the ID of 
the corresponding band. More generally, we replace a CVT ID by 
following the predicate that leads to the most diverse set of entities 
in the knowledge base (e.g., we use the band ID, not the less diverse 
role ID). Formally, for each “incoming” predicate p1 we choose the 
“outgoing” predicate p2 such that 

p_2(p_1) = \arg\max_{p_2} \left| \{\, b \mid \exists a, \mathrm{CVT} : e = a \xrightarrow{p_1} \mathrm{CVT} \xrightarrow{p_2} b \,\} \right|. 
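
The selection rule can be illustrated with a short sketch. It assumes the same plain-tuple triple representation as before and counts, for each incoming predicate p1, the distinct entities b reached through each outgoing predicate p2. The function name, the predicate names, and the entity IDs are hypothetical, and ties are broken arbitrarily.

```python
from collections import defaultdict


def most_diverse_outgoing(triples, is_cvt):
    # For each incoming predicate p1, pick the outgoing predicate p2 that
    # reaches the largest number of distinct entities b.
    cvt_in = defaultdict(set)   # CVT id -> {p1}
    cvt_out = defaultdict(set)  # CVT id -> {(p2, b)}
    for s, p, o in triples:
        if is_cvt(o):
            cvt_in[o].add(p)
        elif is_cvt(s):
            cvt_out[s].add((p, o))
    targets = defaultdict(lambda: defaultdict(set))  # p1 -> p2 -> {b}
    for cvt, p1s in cvt_in.items():
        for p1 in p1s:
            for p2, b in cvt_out.get(cvt, ()):
                targets[p1][p2].add(b)
    return {
        p1: max(by_p2, key=lambda p2: len(by_p2[p2]))
        for p1, by_p2 in targets.items()
    }


# Hypothetical data: three musicians in three bands, but only two roles.
memberships = [
    ("/m/alice", "member_of", "cvt1"),
    ("cvt1", "group", "/m/bandA"),
    ("cvt1", "role", "/m/singer"),
    ("/m/bob", "member_of", "cvt2"),
    ("cvt2", "group", "/m/bandB"),
    ("cvt2", "role", "/m/singer"),
    ("/m/carol", "member_of", "cvt3"),
    ("cvt3", "group", "/m/bandC"),
    ("cvt3", "role", "/m/drummer"),
]
choice = most_diverse_outgoing(memberships, lambda node: node.startswith("cvt"))
```

Here `group` reaches three distinct entities and `role` only two, so the CVT for a band membership would be replaced by the band ID, matching the example in the text.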
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